Abstract.

This paper is about Information Geometry, a relatively new subject within 
mathematical statistics that attempts to study the problem of inference by using 
tools from modern differential geometry. This paper provides an overview of some 
of the achievements and possible future applications of this subject to physics. 
1. Introduction

It is not surprising that geometry should play a fundamental role in the theory of 
inference. The very idea of what constitutes a good model cannot be stated clearly 
without reference to geometric concepts such as size and form of the model as well 
as distance between probability distributions. Recall that a statistical model (hypothesis
space) is a collection of probability distributions for the data. Therefore,
a good model should be big enough to include a close approximation to the true 
distribution of the data, but small enough to facilitate the task of identifying this 
approximation. As Willy said: as simple as possible but not too simple. 

But there is more. Regular statistical models have a natural Riemannian structure. Parameterizations correspond to choices of coordinate systems, and the Fisher information matrix in a given parameterization provides the metric in the corresponding coordinate system. By thinking of statistical models as manifolds, hypothesis spaces become places, and it doesn't take much to imagine some of these places as models for the only physical place there is out there, namely: spacetime.
In Section 2 of the paper we apply the techniques of information geometry to show that the space of radially symmetric distributions admits a foliation into pseudo-spheres of increasing radius. If we think of a radially symmetric distribution as describing an uncertain physical position, we discover a hypothesis space that in many ways resembles an expanding spacetime: an isotropic, homogeneous space with pseudo-spherical symmetries and with time increasing with decreasing curvature radius. This admittedly simple toy model of reality already suggests a number of truly remarkable consequences for the nature of spacetime. Here are three of them:

1. The appearance of time is a consequence of uncertainty. 

2. Space is infinite dimensional and only on the average appears as four dimensional.

3. Spin is a property of space and not of a particle so that all truly fundamental 
particles must have spin. 

I must emphasize that, at the time of writing, there is no direct experimental 
evidence in favor of any of the above statements. Nevertheless there is indirect 
evidence that they should not be too quickly dismissed as nonsense. 

With respect to the first statement, recall that the appearance of the axis of time, in standard general relativity, is a consequence of specifying an initial and a final 3-geometry on two spacelike hypersurfaces plus evolution according to the field equation [?]. Time is therefore a consequence of 3-space geometry and the field equation of general relativity, which in turn seems to be of a statistical nature (see [?] and the sections below). These, I believe, are facts that support, at least in spirit, the first claim above.

There is absolutely no evidence that space has infinitely many dimensions, but were this true, it would explain why we observe only four of them. It also seems a priori desirable to have a model that produces observed space as a macroscopic object, not unlike pressure or temperature.

With respect to the third statement, Hestenes [?] shows that many of the rules for manipulating spin have nothing to do with quantum mechanics but are just general expressions for the geometry of space. It is also worth noticing that the standard model allows the existence of elementary particles without spin, but these have not yet been observed.

But there is still more. Think of the different roles that the concepts of entropy, curvature and local hyperbolicity play in statistics and in physics, and you will realize that the link is a useful bridge for transporting ideas from physics to statistics and vice versa. The following Sections 3, 4 and 5 of this paper do exactly that. That is, they examine the meaning of each of these concepts (entropy, curvature and local hyperbolicity) keeping the proposed link between inference and physics in mind.

The link between information geometry and general relativity promises big rewards for both statistical inference and physics. For example, statisticians may look at the field equations of general relativity as a procedure for generating statistical models from prior information encoded in the distribution of energy and matter fields. On the other hand, physicists may see information geometry as a possible language for the elusive theory of quantum gravity, since it is a language already made out of the right ingredients: uncertainty and differential geometry.
2. The Hypothesis Space of Radially Symmetric Distributions



Let $\mathcal{R}_3$ be the collection of all radially symmetric distributions on three dimensional euclidean space. The probability of an infinitesimal region around the point $x \in \mathbb{R}^3$, of volume $d^3x$, that is assigned by a general element of $\mathcal{R}_3$ is given by,

$$P(d^3x \mid \psi, \theta, \sigma) = \frac{1}{\sigma^3}\,\psi^2\!\left(\frac{(x-\theta)^2}{\sigma^2}\right) d^3x \tag{1}$$

where $\theta \in \mathbb{R}^3$ is a location parameter, $\sigma > 0$ is a scale parameter and $\psi$ is an arbitrary differentiable function of $r^2 > 0$ satisfying the normalization condition:

$$\int_0^\infty r^2\,|\psi(r^2)|^2\,dr = \frac{1}{4\pi} \tag{2}$$



Equation (2) ensures that the probability assigned to the whole space by (1) is in fact 1. The derivative $\psi'$ must also decrease to zero sufficiently fast so that the integrals defining the metric below exist. Since $\psi$ is an infinite dimensional parameter, $\mathcal{R}_3$ is also an infinite dimensional manifold, but the space $\mathcal{R}_3(\psi)$ of radially symmetric distributions for a given function $\psi$ is a four dimensional submanifold of $\mathcal{R}_3$ parameterized by $(\theta^0, \theta^1, \theta^2, \theta^3) = (\sigma, \theta)$. The metric in $\mathcal{R}_3(\psi)$ is given by the $4 \times 4$ Fisher information matrix (see [?], p. 63) with entries:



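$$g_{\mu\nu} = 4\int (\partial_\mu f)(\partial_\nu f)\,d^3x \tag{3}$$

where $\mu, \nu = 0, \dots, 3$, the function $f$ is the square root of the density given in (1), i.e.,

$$f(x \mid \theta, \sigma) = \sigma^{-3/2}\,\psi\!\left(\frac{(x-\theta)^2}{\sigma^2}\right) \tag{4}$$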



and $\partial_\mu$ denotes the derivative with respect to $\theta^\mu$. Let us separate the computation of the metric tensor terms into three parts: the entries $g_{0i}$, the entries $g_{ij}$ for $i, j = 1, 2, 3$, and the element $g_{00}$. Replacing (4) into (3), doing the change of variables $x = \theta + \sigma y$ and using the fact that

$$\partial_i\,\psi(y^2) = -\frac{2}{\sigma}\,y_i\,\psi'(y^2) \tag{5}$$

we get,

$$g_{ij} = \frac{16}{\sigma^2}\int y_i\,y_j\,\big(\psi'(y^2)\big)^2\,d^3y \tag{6}$$

where $y^2 = |y|^2$ is the Clifford product of the vector $y$ by itself. Carrying out the integration in spherical coordinates we obtain,

$$g_{ii} = \frac{64\pi}{3\sigma^2}\int_0^\infty r^4\,|\psi'(r^2)|^2\,dr \tag{7}$$

and,

$$g_{ij} = 0 \quad \text{for } i \neq j \tag{8}$$
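As a sanity check of the normalization (2) and of the diagonal factor $K(\psi) = \frac{64\pi}{3}\int_0^\infty r^4|\psi'(r^2)|^2\,dr$ (the factor of $1/\sigma^2$ in $g_{ii}$), here is a minimal Python sketch using only the standard library. It assumes one particular, hypothetical profile, the Gaussian $\psi(s) = (2\pi)^{-3/4}e^{-s/4}$, for which (1) is the isotropic normal distribution with covariance $\sigma^2$ times the identity; the location Fisher information of that family is known to be $1/\sigma^2$ per coordinate, so $K$ should come out as 1. The helper names are illustrative.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def half_line(g, n=20000):
    """Integrate g over [0, inf) using the substitution r = tan(t)."""
    return simpson(lambda t: g(math.tan(t)) / math.cos(t) ** 2, 0.0, math.pi / 2, n)


# Hypothetical profile: psi(s)^2 is the standard 3-D normal density at |y|^2 = s.
def psi(s):
    return (2 * math.pi) ** -0.75 * math.exp(-s / 4)


def dpsi(s):
    return -psi(s) / 4  # derivative of psi with respect to its argument s = r^2


def normalization(psi):
    """Left hand side of (2); should equal 1/(4*pi)."""
    return half_line(lambda r: r**2 * psi(r**2) ** 2)


def k_factor(dpsi):
    """K(psi): the factor of 1/sigma^2 in the diagonal entries g_ii."""
    return 64 * math.pi / 3 * half_line(lambda r: r**4 * dpsi(r**2) ** 2)


print(normalization(psi) * 4 * math.pi)  # close to 1.0
print(k_factor(dpsi))                    # close to 1.0
```

The substitution $r = \tan t$ maps the half line onto a finite interval, so no ad hoc cut-off radius is needed.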
The derivative with respect to $\sigma$ of the function given in (4) is,

$$\partial_0 f = -\frac{1}{2\sigma^{5/2}}\left[3\psi(y^2) + 4y^2\,\psi'(y^2)\right] \tag{9}$$

and therefore from this and (5) we have,

$$g_{0i} = 4\int (\partial_0 f)(\partial_i f)\,d^3x \;\propto\; \int \left[3\psi(y^2) + 4y^2\,\psi'(y^2)\right]\psi'(y^2)\,y_i\,d^3y = 0 \tag{10}$$

where the value 0 for the last integral follows by performing the integration in spherical coordinates or simply by symmetry, after noticing that the integrand is odd. Finally, from (3) and (9) we get,



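$$g_{00} = \frac{4\pi}{\sigma^2}\int_0^\infty \left[3\psi(r^2) + 4r^2\,\psi'(r^2)\right]^2 r^2\,dr \tag{11}$$

Expanding the square and integrating the cross term by parts shows that,

$$\int_0^\infty r^4\,\psi(r^2)\,\psi'(r^2)\,dr = -\frac{3}{4}\cdot\frac{1}{4\pi} \tag{12}$$

where we took $u = r^3\psi(r^2)/2$ and $dv = 2r\,\psi'(r^2)\,dr$ (so that $v = \psi(r^2)$) for the integration by parts. Since the boundary term $[uv]_0^\infty$ vanishes, this gives $\int_0^\infty r^4\psi\psi'\,dr = -\frac{3}{2}\int_0^\infty r^2\psi^2\,dr - \int_0^\infty r^4\psi\psi'\,dr$, and (12) follows after using (2). We obtain,

$$g_{00} = \frac{4\pi}{\sigma^2}\left[-\frac{9}{4\pi} + 16\int_0^\infty r^6\,|\psi'(r^2)|^2\,dr\right] \tag{13}$$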
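The algebra leading from the integral form of $g_{00}$ to its simplified form can be checked numerically. The Python sketch below (standard library only; all names are illustrative) evaluates $\sigma^2 g_{00} = 4\pi\int_0^\infty[3\psi + 4r^2\psi']^2r^2\,dr$ directly, the simplified $J(\psi) = 64\pi\int_0^\infty r^6|\psi'(r^2)|^2\,dr - 9$, and the cross term $\int_0^\infty r^4\psi\psi'\,dr$, which should equal $-3/(16\pi)$. It assumes two hypothetical profiles: the Gaussian one, for which $J$ should be 6 (the scale Fisher information of an isotropic three dimensional normal is $3 \times 2/\sigma^2$), and a heavy-tailed $\psi(s) = c\,(1+s)^{-2}$ with $c$ fixed numerically by (2).

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def half_line(g, n=20000):
    """Integrate g over [0, inf) using the substitution r = tan(t)."""
    return simpson(lambda t: g(math.tan(t)) / math.cos(t) ** 2, 0.0, math.pi / 2, n)


def gaussian_profile():
    psi = lambda s: (2 * math.pi) ** -0.75 * math.exp(-s / 4)
    return psi, (lambda s: -psi(s) / 4)


def rational_profile():
    # Hypothetical heavy-tailed profile psi(s) = c / (1 + s)^2, normalized by (2).
    c = 1.0 / math.sqrt(4 * math.pi * half_line(lambda r: r**2 / (1 + r**2) ** 4))
    return (lambda s: c / (1 + s) ** 2), (lambda s: -2 * c / (1 + s) ** 3)


def j_direct(psi, dpsi):
    """sigma^2 * g00 from the integral of [3 psi + 4 r^2 psi']^2 r^2."""
    return 4 * math.pi * half_line(lambda r: (3 * psi(r**2) + 4 * r**2 * dpsi(r**2)) ** 2 * r**2)


def j_simplified(psi, dpsi):
    """sigma^2 * g00 from 64*pi * integral of r^6 psi'^2, minus 9."""
    return 64 * math.pi * half_line(lambda r: r**6 * dpsi(r**2) ** 2) - 9


def cross_term(psi, dpsi):
    """Integral of r^4 psi psi'; should equal -3/(16*pi)."""
    return half_line(lambda r: r**4 * psi(r**2) * dpsi(r**2))


for name, (psi, dpsi) in [("gaussian", gaussian_profile()), ("rational", rational_profile())]:
    print(name, j_direct(psi, dpsi), j_simplified(psi, dpsi), cross_term(psi, dpsi) * 16 * math.pi)
```

Agreement of the first two columns for a non-Gaussian profile is the informative part, since the simplification relies only on (2) and on the vanishing boundary term.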



The full metric tensor looks like this,

$$(g_{\mu\nu}) = \frac{1}{\sigma^2}\begin{pmatrix} J(\psi) & 0 & 0 & 0 \\ 0 & K(\psi) & 0 & 0 \\ 0 & 0 & K(\psi) & 0 \\ 0 & 0 & 0 & K(\psi) \end{pmatrix} \tag{14}$$

where $J(\psi)$ and $K(\psi)$ are just shorthand notations for the factors of $1/\sigma^2$ in (13) and (7). These functions are always positive and they depend only on $\psi$. Straightforward calculations, best done with a symbolic manipulator like MAPLE, show that a space with this metric has constant negative sectional curvature $-1/J(\psi)$. It follows that for a fixed value of the function $\psi$ the hypothesis space of radially symmetric distributions $\mathcal{R}_3(\psi)$ is the pseudo-sphere of radius $J^{1/2}(\psi)$. We have therefore shown that the space of radially symmetric distributions has a foliation (i.e. a partition into submanifolds) of pseudo-spheres of increasing radius.
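The curvature claim can also be seen without a symbolic manipulator. The short derivation below is a sketch that uses only two standard facts: the upper half-space metric $(d\sigma^2 + |d\theta|^2)/\sigma^2$ has constant sectional curvature $-1$, and multiplying a metric by a constant $c > 0$ divides its sectional curvatures by $c$.

```latex
% Metric (14), with constants J = J(\psi) > 0 and K = K(\psi) > 0:
ds^2 = \frac{J\,d\sigma^2 + K\,|d\theta|^2}{\sigma^2}
% Rescale the location coordinates, \theta' = \sqrt{K/J}\,\theta:
ds^2 = J\,\frac{d\sigma^2 + |d\theta'|^2}{\sigma^2}
% i.e. J times the upper half-space model of hyperbolic 4-space. Hence
\kappa = -\frac{1}{J(\psi)}, \qquad \text{curvature radius} = J(\psi)^{1/2}
% and, in dimension n = 4, the Ricci tensor and scalar curvature are
R_{\mu\nu} = (n-1)\,\kappa\,g_{\mu\nu} = -\frac{3}{J(\psi)}\,g_{\mu\nu},
\qquad R = n(n-1)\,\kappa = -\frac{12}{J(\psi)}
```

For the Gaussian profile, $J = 6$ and $K = 1$, so the sectional curvature is $-1/6$.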

This is a mathematical theorem. There can be nothing controversial about it. What may be disputed, however, is my belief that the hypothesis space of radially symmetric distributions may be telling us something new about the nature of real physical spacetime. What I find interesting is the fact that if we think of position subject to radially symmetric uncertainty, then the mathematical object describing the positions (i.e. the space of its distributions) has all the symmetries of space plus time. It seems that time, or something like time, pops out automatically when we have uncertain positions. I like to state this hypothesis with the phrase: there is no time, only uncertainty.



2.1. UNCERTAIN SPINNING SPACE? 

The hypothesis space of radially symmetric distributions is the space of distributions for a random vector $y \in \mathbb{R}^3$ of the form,

$$y = x + \epsilon \tag{15}$$

where $x \in \mathbb{R}^3$ is a non random location vector, and $\epsilon \in \mathbb{R}^3$ is a random vector with a distribution radially symmetric about the origin and with standard deviation $\sigma > 0$ in all directions. It turns out that exactly the same hypothesis space is obtained if instead of (15) we use,

$$y = x + i\,\epsilon \tag{16}$$

where $i$ is the constant unit pseudoscalar of the Clifford algebra of $\mathbb{R}^3$. The pseudoscalar $i$ has unit magnitude, commutes with all the elements of the algebra, squares to $-1$ and represents the oriented unit volume of $\mathbb{R}^3$ [?]. By taking expectations with the probability measure indexed by $(x, \sigma, \psi)$ we obtain that,

$$E(y \mid x, \sigma, \psi) = x \tag{17}$$

and,

$$E(y^2 \mid x, \sigma, \psi) = x^2 - \sigma^2 \tag{18}$$

Equation (18) shows that, even though the space of radially symmetric distributions is infinite dimensional, on the average the intervals look like the usual spacetime intervals.

We may think of $y$ in (16) as encoding a position in 3-space together with an uncertain degree of orientation given by the bivector part of $y$, i.e. $i\epsilon$. In other words, we assign to the point $x$ an intrinsic orientation of direction $\epsilon$ and magnitude $|\epsilon|$. In this model the uncertainty is not directly about the location $x$ (as in (15)) but about its postulated degree of orientation (or spinning).
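The Clifford-algebra identities behind (17) and (18) can be checked with a minimal Python sketch (standard library only) that represents the Clifford algebra of $\mathbb{R}^3$ by the Pauli matrices, in which the unit pseudoscalar $i = e_1e_2e_3$ becomes the imaginary unit times the identity. For every sample, $y^2 = x^2 - \epsilon^2 + 2i\,(x\cdot\epsilon)$ is a scalar plus a pseudoscalar, so its scalar part averages to $x^2 - E|\epsilon|^2$. Two assumptions are made here that are not in the text: $\epsilon$ is Gaussian, and its per-coordinate standard deviation is $\sigma/\sqrt{3}$, so that $E|\epsilon|^2 = \sigma^2$ and (18) holds as written. If instead every coordinate of $\epsilon$ has standard deviation $\sigma$, the same computation gives $x^2 - 3\sigma^2$.

```python
import random

# Pauli matrices: a faithful 2x2 complex representation of the Clifford algebra of R^3.
S = (((0, 1), (1, 0)), ((0, -1j), (1j, 0)), ((1, 0), (0, -1)))


def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)) for r in range(2))


def lincomb(coeffs, mats):
    """Linear combination sum_k coeffs[k] * mats[k] of 2x2 matrices."""
    return tuple(
        tuple(sum(co * m[r][c] for co, m in zip(coeffs, mats)) for c in range(2)) for r in range(2)
    )


def half_trace(m):
    return (m[0][0] + m[1][1]) / 2


# Unit pseudoscalar i = e1 e2 e3; in this representation it is 1j times the identity.
PSEUDO = matmul(matmul(S[0], S[1]), S[2])


def y_of(x, eps):
    """The multivector y = x + i*eps of (16), for 3-vectors x and eps."""
    return lincomb((1, 1), (lincomb(x, S), matmul(PSEUDO, lincomb(eps, S))))


def grades(m):
    """Scalar, vector, bivector (coefficients of i*e_k) and pseudoscalar parts of m."""
    vector = [half_trace(matmul(m, s)).real for s in S]
    bivector = [half_trace(matmul(m, s)).imag for s in S]
    return half_trace(m).real, vector, bivector, half_trace(m).imag


def monte_carlo(x, sigma, n=40_000, seed=0):
    """Average y and y^2 over n draws of eps (assumed Gaussian with E|eps|^2 = sigma^2)."""
    rng = random.Random(seed)
    s = sigma / 3 ** 0.5  # per-coordinate standard deviation (assumption, see lead-in)
    mean_y = ((0, 0), (0, 0))
    mean_y2 = 0j
    for _ in range(n):
        eps = [rng.gauss(0.0, s) for _ in range(3)]
        y = y_of(x, eps)
        mean_y = lincomb((1, 1 / n), (mean_y, y))
        mean_y2 += half_trace(matmul(y, y)) / n  # y^2 is (scalar + pseudoscalar) * identity
    return grades(mean_y), mean_y2


(scalar, vector, bivector, pseudo), mean_sq = monte_carlo((1.0, 2.0, -0.5), sigma=1.0)
print(vector)    # equals x up to rounding, as in (17)
print(bivector)  # close to [0, 0, 0]
print(mean_sq)   # real part close to x^2 - sigma^2 = 4.25, imaginary part close to 0
```

The Pauli representation is one convenient choice; any faithful representation of the algebra would give the same averages.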



3. Entropy and Ignorance 

The notion of statistical entropy is not only related to the corresponding notion in physics; it is exactly the same thing, as demonstrated long ago by Jaynes [?]. Entropy appears indisputably as the central quantity of information geometry. In particular, from the Kullback number (relative entropy) between two distributions in the model we obtain the metric, the volume element, a large class of connections [12], and a notion of ignorance within the model given by the so called entropic priors [9]. In this section I present a simple argument, inspired by the work of Zellner on MDIP priors [10], showing that entropic priors are the statistical representation of the vacuum of information in a given hypothesis space.
Let $\mathcal{H} = \{f(x\mid\theta) : \theta \in \Theta\}$ be a general regular hypothesis space of probability density functions $f(x\mid\theta)$ for a vector of observations $x$ conditional on a vector of parameters $\theta = (\theta^\mu)$. Let us denote by $f(x,\theta)$ the joint density of $x$ and $\theta$, and by $f(x)$ and $\pi(\theta)$ the marginal density of $x$ and the prior on $\theta$ respectively. We have,

$$f(x,\theta) = f(x\mid\theta)\,\pi(\theta) \tag{19}$$

Since $\mathcal{H}$ is regular, the Fisher information matrix,

$$g_{\mu\nu}(\theta) = 4\int \left(\partial_\mu f^{1/2}(x\mid\theta)\right)\left(\partial_\nu f^{1/2}(x\mid\theta)\right)dx \tag{20}$$

exists and it is continuous and positive definite (thus non singular) at every $\theta$. As in (3), $\partial_\mu$ denotes the partial derivative with respect to $\theta^\mu$. The space $\mathcal{H}$ with the metric $g = (g_{\mu\nu})$ given in (20) forms a Riemannian manifold. Therefore, the invariant element of volume is given by,

$$\eta(d\theta) \propto \sqrt{\det g(\theta)}\,d\theta \tag{21}$$
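For the one dimensional gaussians that reappear in Section 3.1, the Fisher metric (20) and the Jeffreys density (21) can be evaluated numerically. The Python sketch below (standard library only; function names are illustrative) approximates the derivatives of $f^{1/2}$ by central differences and the $x$ integral by Simpson's rule. It should reproduce the known metric $g = \mathrm{diag}(1/\sigma^2, 2/\sigma^2)$ in the coordinates $(\mu, \sigma)$, and hence $\sqrt{\det g} = \sqrt{2}/\sigma^2$.

```python
import math


def sqrt_density(x, mu, sigma):
    """f(x | mu, sigma)^(1/2) for the normal distribution N(mu, sigma^2)."""
    return (2 * math.pi * sigma**2) ** -0.25 * math.exp(-((x - mu) ** 2) / (4 * sigma**2))


def fisher_metric(mu, sigma, h=1e-5, n=4000, width=12.0):
    """Approximate (20): g_ab = 4 * integral of d_a f^(1/2) * d_b f^(1/2) dx, with (a, b) in {mu, sigma}."""
    lo, hi = mu - width * sigma, mu + width * sigma
    dx = (hi - lo) / n  # n must be even for Simpson's rule
    g = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        x = lo + k * dx
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        grad = (
            (sqrt_density(x, mu + h, sigma) - sqrt_density(x, mu - h, sigma)) / (2 * h),
            (sqrt_density(x, mu, sigma + h) - sqrt_density(x, mu, sigma - h)) / (2 * h),
        )
        for a in range(2):
            for b in range(2):
                g[a][b] += 4 * weight * grad[a] * grad[b] * dx / 3
    return g


def jeffreys(mu, sigma):
    """Unnormalized Jeffreys density sqrt(det g), as in (21)."""
    g = fisher_metric(mu, sigma)
    return math.sqrt(g[0][0] * g[1][1] - g[0][1] * g[1][0])


print(fisher_metric(0.3, 1.7))  # close to [[1/1.7**2, 0], [0, 2/1.7**2]]
print(jeffreys(0.3, 1.7))       # close to sqrt(2)/1.7**2
```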

The volume element (21) is in fact a differential form [11, p. 166] that provides a notion of surface area for the manifold $\mathcal{H}$, and it is naturally interpreted as the uniform distribution over $\mathcal{H}$. Formula (21), known as Jeffreys' rule, is often used as a universal method for building total ignorance priors. However, (21) does not take into account the fact that a truly ignorant prior for $\theta$ should contain as little information as possible about the data $x$. The entropic prior in $\mathcal{H}$ demands that the joint distribution of $x$ and $\theta$, $f(x,\theta)$, be as difficult as possible to discriminate from the independent model $h(x)\sqrt{\det g(\theta)}$, where $h(x)$ is an initial guess for $f(x)$. That is, we are looking for the prior that minimizes the Kullback number between $f(x,\theta)$ and the independent model, or in other words, the prior that makes the joint distribution of $x$ and $\theta$ have maximum entropy relative to the measure $h(x)\sqrt{\det g(\theta)}\,dx\,d\theta$. Thus, the entropic prior is the density $\pi(\theta)$ that solves the variational problem,

$$\min_{\pi}\int f(x,\theta)\,\log\frac{f(x,\theta)}{h(x)\sqrt{\det g(\theta)}}\,dx\,d\theta \tag{22}$$

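Replacing (19) into (22), simplifying, and using a Lagrange multiplier, $\lambda$, for the normalization constraint $\int \pi(\theta)\,d\theta = 1$, we find that $\pi$ must minimize,

$$\int \pi(\theta)\,I(\theta:h)\,d\theta + \int \pi(\theta)\log\frac{\pi(\theta)}{\sqrt{\det g(\theta)}}\,d\theta + \lambda\int \pi(\theta)\,d\theta \tag{23}$$

where $I(\theta:h)$ denotes the Kullback number between $f(x\mid\theta)$ and $h(x)$, i.e.,

$$I(\theta:h) = \int f(x\mid\theta)\log\frac{f(x\mid\theta)}{h(x)}\,dx \tag{24}$$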
The Lagrangian $\mathcal{L}$ is given by the sum of the integrands in (23), and the Euler-Lagrange equation is then,

$$\frac{\partial\mathcal{L}}{\partial\pi} = I(\theta:h) + \log\frac{\pi(\theta)}{\sqrt{\det g(\theta)}} + 1 + \lambda = 0 \tag{25}$$

from where we obtain that,

$$\pi(\theta) \propto e^{-I(\theta:h)}\sqrt{\det g(\theta)} \tag{26}$$
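The passage from the objective to the density $\pi \propto e^{-I(\theta:h)}\sqrt{\det g}$ is an instance of the Gibbs variational principle. A minimal Python sketch, assuming a discretized parameter grid, checks it for the scale family $N(0, \sigma^2)$ with $h = N(0, s_0^2)$; the model, grid and names are illustrative assumptions. For this family $I(\sigma:h) = \log(s_0/\sigma) + \sigma^2/(2s_0^2) - 1/2$ and $\sqrt{\det g} = \sqrt{2}/\sigma$. A weight $\alpha$ on $I$ is included to cover the family (27); $\alpha = 1$ gives (26). On the grid, the objective (without the multiplier term) equals $-\log Z$ at the candidate density and is larger for any other normalized density.

```python
import math
import random


def kl_scale(sigma, s0):
    """Kullback number I(sigma : h) between N(0, sigma^2) and h = N(0, s0^2)."""
    return math.log(s0 / sigma) + sigma**2 / (2 * s0**2) - 0.5


def sqrt_det_g(sigma):
    """Jeffreys density sqrt(det g) for the scale family N(0, sigma^2)."""
    return math.sqrt(2) / sigma


def objective(pi, grid, step, s0, alpha):
    """Riemann sum of the objective without the multiplier term, with I weighted by alpha."""
    return step * sum(
        p * (alpha * kl_scale(s, s0) + math.log(p / sqrt_det_g(s))) for p, s in zip(pi, grid)
    )


def entropic_prior(grid, step, s0, alpha=1.0):
    """pi proportional to exp(-alpha * I) * sqrt(det g), normalized on the grid; returns (pi, Z)."""
    unnorm = [math.exp(-alpha * kl_scale(s, s0)) * sqrt_det_g(s) for s in grid]
    z = step * sum(unnorm)
    return [u / z for u in unnorm], z


step = 0.01
grid = [0.05 + step * k for k in range(600)]  # hypothetical grid: sigma from 0.05 to 6.04
best, z = entropic_prior(grid, step, s0=1.0)
print(objective(best, grid, step, 1.0, 1.0), -math.log(z))  # the two numbers agree

rng = random.Random(1)
for _ in range(5):
    # Perturb the candidate optimum, renormalize, and confirm the objective goes up.
    trial = [p * math.exp(0.3 * rng.uniform(-1, 1)) for p in best]
    norm = step * sum(trial)
    trial = [t / norm for t in trial]
    print(objective(trial, grid, step, 1.0, 1.0) > objective(best, grid, step, 1.0, 1.0))
```

The gap between the two objective values is exactly the discrete Kullback number between the trial and the candidate, which is why it can never be negative.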



The numerical values of the probabilities obtained with formula (26) depend on the base of the logarithm used in (22). However, the base of the logarithm that appears in the definition of the Kullback number is arbitrary (entropy is defined only up to a proportionality constant). Thus, (26) is not just one density, but a family of densities,

$$\pi(\theta \mid \alpha, h) \propto e^{-\alpha I(\theta:h)}\sqrt{\det g(\theta)} \tag{27}$$

indexed by the parameter $\alpha > 0$ and the function $h$. Equation (27) is the family of entropic priors introduced in [?] and studied in more detail in [12], [?] and [?].

It was shown in [?] that the parameter $\alpha$ should be interpreted as the number of virtual observations supporting $h(x)$ as a guess for the distribution of $x$. Large values of $\alpha$ should go with reliable guesses for $h(x)$ but, as was shown in [13], the inferences are then less robust. This indicates that ignorant priors should be entropic priors with the smallest possible value for $\alpha$, i.e., with,

$$\alpha^* = \inf\left\{\alpha > 0 : \int e^{-\alpha I(\theta:h)}\,\eta(d\theta) < \infty\right\} \tag{28}$$



Here is the canonical example. 



3.1. EXAMPLE: THE GAUSSIANS 

Consider the hypothesis space of one dimensional gaussians parameterized by the mean $\mu$ and the standard deviation $\sigma$. When $h$ is an arbitrary gaussian with parameters $\mu_0$ and $\sigma_0$, straightforward computations show that the entropic prior is given by,

$$\pi(\mu, \sigma \mid \alpha, \mu_0, \sigma_0) = \frac{1}{Z}\,\sigma^{\alpha-2}\exp\left(-\frac{\alpha}{2\sigma_0^2}\left[(\mu-\mu_0)^2 + \sigma^2\right]\right) \tag{29}$$

where the normalization constant $Z$ is defined for $\alpha > 1$ and is given by,

$$Z = \frac{\sqrt{\pi}}{2}\left(\frac{2}{\alpha}\right)^{\alpha/2}\sigma_0^{\alpha}\,\Gamma\!\left(\frac{\alpha-1}{2}\right) \tag{30}$$

Thus, in this case $\alpha^* = 1$, and the most ignorant prior is obtained by taking the limit $\alpha \to 1$ and $\sigma_0 \to \infty$ in (29), obtaining, in the limit, an improper density proportional to $1/\sigma$, which makes everybody happy, frequentists and Bayesians alike.
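The closed form of the normalization constant, $Z = \frac{\sqrt{\pi}}{2}(2/\alpha)^{\alpha/2}\sigma_0^{\alpha}\,\Gamma\!\left(\frac{\alpha-1}{2}\right)$, can be checked numerically, because the unnormalized prior $\sigma^{\alpha-2}\exp\!\left(-\alpha[(\mu-\mu_0)^2+\sigma^2]/(2\sigma_0^2)\right)$ factorizes into a gaussian integral in $\mu$ and an integral in $\sigma$ that is finite only for $\alpha > 1$. The Python sketch below (standard library only; names are illustrative) compares the two using Simpson's rule after the substitution $\tan t$. It restricts the numerical check to $\alpha \ge 2$, where the integrand is bounded and smooth at $\sigma = 0$.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def z_closed_form(alpha, sigma0):
    """Normalization constant Z; the entropic prior is improper for alpha <= 1."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    return (0.5 * math.sqrt(math.pi) * (2 / alpha) ** (alpha / 2)
            * sigma0**alpha * math.gamma((alpha - 1) / 2))


def z_numeric(alpha, sigma0):
    """Integrate the unnormalized prior; by translation invariance the result does not depend on mu0."""
    c = alpha / (2 * sigma0**2)
    # mu - mu0 = tan(t) over [-pi/2, pi/2] and sigma = tan(t) over [0, pi/2]
    mu_part = simpson(lambda t: math.exp(-c * math.tan(t) ** 2) / math.cos(t) ** 2,
                      -math.pi / 2, math.pi / 2)
    sigma_part = simpson(lambda t: math.tan(t) ** (alpha - 2) * math.exp(-c * math.tan(t) ** 2)
                         / math.cos(t) ** 2, 0.0, math.pi / 2)
    return mu_part * sigma_part


for alpha in (2.0, 3.0, 4.0, 6.0):
    print(alpha, z_closed_form(alpha, 1.5), z_numeric(alpha, 1.5))  # the two columns agree
```

As $\alpha$ decreases to 1 the factor $\Gamma((\alpha-1)/2)$ blows up, which is the divergence behind $\alpha^* = 1$.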



4. Curvature and Information 



Curvature seems to be well understood only in physics, especially from the point of view of gauge theories, where the curvature form associated to a connection has been shown to encode the field strengths for all four fundamental forces of nature [14]. In statistics, on the other hand, the only thing we know (so far) about the role of curvature is that the higher the scalar curvature is at a given point of the model, the more difficult it is to do estimation at that point. This already agrees nicely with the idea of black holes, for if in a given model there is a curvature $R_0$ beyond which estimation is essentially impossible, then the space is partitioned into three regions with curvatures $R < R_0$, $R = R_0$ and $R > R_0$ that correspond to regular points, horizon points and points inside black holes. Nobody has found an example of a hypothesis space with this kind of inferential black hole yet, but nobody has tried to look for one either. Before rushing into a hunt it seems necessary to clarify what exactly is meant by the words: estimation is essentially impossible at a point.

I believe that one of the most promising areas for research in the field of information geometry is the clarification of the role of curvature in statistical inference. If indeed physical spacetime can be best modeled as a hypothesis space, then what is to be learned from the research on statistical curvature will have direct implications for the nature of physical space. On the other hand, it also seems promising to re-evaluate what is already known in physics about curvature in the light of the proposed link with inference. Even a naive first look will show indications of what to expect for the role of curvature in inference. Here is an attempt at that first look.

From the classic statement: Mass-energy is the source of gravity and the strength of the gravity field is measured by the curvature of spacetime.

We guess: Information is the source of the curvature of hypothesis spaces. That is, prior information is the source of the form of the model.

From: The dynamics of how mass-energy curves spacetime are controlled by the field equation:

$$G = kT \tag{31}$$

where $G$ is the Einstein tensor, $T$ is the stress-energy tensor and $k$ is a proportionality factor,
guess: The field equation controls the dynamics of how prior information produces models.

From: The field equation for empty space is the Euler-Lagrange equation that characterizes the extremum of the Hilbert action with respect to the choice of geometry. That is, it extremizes

$$S_g = \int_\Omega R\,d\Omega, \qquad d\Omega = \sqrt{|\det g|}\,d^4x, \tag{32}$$

where the integral is taken over the interior of a four-dimensional region $\Omega$, $R$ is the scalar curvature and $g$ is the metric,
guess1: The form of hypothesis spaces based on no prior information must satisfy

$$R_{ij} - \frac{1}{2}R\,g_{ij} = 0 \tag{33}$$

where $g_{ij}$ is the Fisher information matrix, $R_{ij}$ is the Ricci tensor and $R$ is the scalar curvature as above.
guess2: Given a hypothesis space with Fisher information matrix $g(\theta)$, the Einstein tensor $G$, i.e. the left hand side of (33), quantifies the amount of prior information locally contained in the model at each point $\theta$.
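As a worked instance of these guesses, take the four dimensional spaces $\mathcal{R}_3(\psi)$ of Section 2, which have constant sectional curvature $\kappa = -1/J(\psi)$. The computation below is a sketch using the standard identities for a space of constant curvature in dimension $n$; it only evaluates the left hand side of (33) and does not settle how that quantity should be read as prior information.

```latex
% Constant sectional curvature \kappa in dimension n:
R_{ij} = (n-1)\,\kappa\,g_{ij}, \qquad R = n(n-1)\,\kappa,
\qquad G_{ij} = R_{ij} - \tfrac{1}{2}R\,g_{ij} = -\frac{(n-1)(n-2)}{2}\,\kappa\,g_{ij}
% For \mathcal{R}_3(\psi): n = 4 and \kappa = -1/J(\psi), so
G_{ij} = \frac{3}{J(\psi)}\,g_{ij} \neq 0
% For n = 2 (e.g. the one dimensional gaussians) G_{ij} vanishes identically.
```

Read through guess1, this would say that $\mathcal{R}_3(\psi)$ is not a space based on no prior information, while in two dimensions (33) holds for every hypothesis space and so cannot discriminate between them.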



5. Hyperbolicity 

What seems most intriguing with respect to the link between information geometry and general relativity is the role of hyperbolicity. We know from general relativity that physical spacetimes are pseudo-Riemannian manifolds which are locally Lorentzian. That is, at each point, the space looks locally like Minkowski space. Or, in other words, the symmetries of the tangent space at each point are those of hyperbolic space. On the other hand, in information geometry, hyperbolicity appears at two very basic levels. First, hyperbolicity appears connected to the notion of regularity through the property of local asymptotic normality (LAN for short, see [?]). This is in close agreement with what happens in physics. The LAN property says that the manifold of distributions of n independent and identically distributed observations from a regular model can be locally approximated by gaussians for large n, and since the gaussians are known to form hyperbolic spaces, the correspondence with physics is perfect. Second, in statistical inference hyperbolicity also appears mysteriously connected to entropy and Bayes' theorem (see my From Euclid to Entropy [?]), and by following the link back to general relativity we obtain a completely new and unexpected result: entropy and Bayes' theorem are the source of the local hyperbolicity of spacetime! That entropy and thermodynamics are related to general relativity may have seemed outrageous in the past, but not today. It does not seem outrageous at all when we consider that Bekenstein found that the entropy of a black hole is proportional to its surface area [?], that Hawking discovered that black holes have a temperature [?], and especially that Jacobson showed that the field equation is like an equation of state in thermodynamics [?].
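The statement that the gaussians form hyperbolic spaces can be checked numerically in the one dimensional case, whose Fisher metric in the coordinates $(\mu, \sigma)$ is $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$. The Python sketch below (standard library only; names are illustrative) applies the classical formula for the Gaussian curvature of an orthogonal metric $E\,du^2 + G\,d\sigma^2$ whose coefficients depend only on $\sigma$, namely $\kappa = -\frac{1}{2\sqrt{EG}}\,\partial_\sigma\!\left(\partial_\sigma E/\sqrt{EG}\right)$, with derivatives taken by central differences. It should return the constant $-1/2$ at every $\sigma$. Applied to $E = 1/\sigma^2$, $G = 6/\sigma^2$, the $(\sigma, \theta^1)$ block of the metric of $\mathcal{R}_3(\psi)$ for the Gaussian profile ($J = 6$, $K = 1$), it gives $-1/6 = -1/J$.

```python
import math


def gaussian_curvature(E, G, sigma, h=1e-4):
    """Curvature of E(sigma) du^2 + G(sigma) dsigma^2 (orthogonal, coefficients depend on sigma only)."""

    def dE(s):
        return (E(s + h) - E(s - h)) / (2 * h)

    def q(s):
        return dE(s) / math.sqrt(E(s) * G(s))

    dq = (q(sigma + h) - q(sigma - h)) / (2 * h)
    return -dq / (2 * math.sqrt(E(sigma) * G(sigma)))


# One dimensional gaussians: ds^2 = (dmu^2 + 2 dsigma^2) / sigma^2
E_normal = lambda s: 1 / s**2
G_normal = lambda s: 2 / s**2
for sigma in (0.5, 1.0, 3.0):
    print(sigma, gaussian_curvature(E_normal, G_normal, sigma))  # close to -0.5

# (sigma, theta^1) block of the metric of R_3(psi) for the Gaussian profile, J = 6 and K = 1
print(gaussian_curvature(lambda s: 1 / s**2, lambda s: 6 / s**2, 1.0))  # close to -1/6
```

Finite differences keep the sketch dependency-free; a symbolic package would give the exact constants.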